OPERATOR APPROACH TO VALUES OF STOCHASTIC GAMES WITH VARYING STAGE DURATION

SYLVAIN SORIN AND GUILLAUME VIGERAL 


Abstract. We study the links between the values of stochastic games with varying stage duration
h, the corresponding Shapley operators $T$ and $T_h = hT + (1 - h)Id$, and the solution of the
evolution equation $\dot f_t = (T - Id) f_t$. Considering general non expansive maps we establish two
kinds of results, under both the discounted and the finite length frameworks, that apply to the class
of “exact” stochastic games. First, for a fixed length or discount factor, the value converges as
the stage duration goes to 0. Second, the asymptotic behavior of the value as the length goes to
infinity, or as the discount factor goes to 0, does not depend on the stage duration. In addition,
these properties imply the existence of the value of the finite length or discounted continuous
time game (associated to a continuous time jointly controlled Markov process), as the limit of
the value of any discretization with vanishing mesh.


1. Introduction 

The operator introduced by Shapley [17] to study zero-sum discounted stochastic games is a
non expansive map T from a Banach space to itself. Several results have been obtained by using 
a similar “operator approach” in the framework of zero-sum repeated games, US], [8], US], m- 
In particular the analysis extends to general repeated games (including incomplete information 
and signals, see [6] Chapter IV) and arbitrary evaluation of the sequence of stage payoffs.

An important part of the literature studies families of evaluations with vanishing stage weight 
(either length going to infinity or discount factor going to 0) and the main issue is the existence 
of an asymptotic value. Assuming that the stage duration is one, each evaluation induces a time
ponderation on $\mathbb{R}^+$ and vanishing stage weight leads to an increasing number $n_\alpha$ of interactions
during any given fraction $\alpha \in\, ]0,1[$ of the game that has been played according to this ponderation.

We consider here another direction of research: the time ponderation is fixed and the stage
duration vanishes (leading to a continuous time game at the limit). Note that, as above, this
leads to an increasing number $n_\alpha$ of interactions.

We study in particular stochastic games with varying stage duration, in the spirit of Neyman
[10]. Our approach is based on the non expansive property of the Shapley operator to derive
convergence results, characterization of the values, and links with evolution equations in continuous
time.

The structure of the paper is as follows: 

We first recall the definition of the Shapley operator T associated to a stochastic game (Section 
2) and the related finite and discounted iterations. 

We introduce in Section 3 two models of stochastic games with variable stage duration h: linearization
via “exact” games, and “discretization” of a continuous time model. In both frameworks
we describe the link with the fractional Shapley operator $T_h$.

Sections 4 and 5 are devoted to the abstract analysis of various fractional iterations of a general 
non expansive map T: 

- first in the finite iteration case, where we establish relations between the n-th iterate of $T_h$ and the
solution of the evolution equation $\dot f_t = (T - Id) f_t$ at time $t = nh$,

- then in the discounted case, where we identify the $\lambda$-discounted evaluation associated to $T_h$ as
the $\mu$-discounted evaluation associated to T, with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$.

We then apply these results to the case of exact stochastic games. Sections 6 and 7 are respectively
devoted to the study of games with finite length and with discount factor. In both frameworks
we establish results of two different kinds. Firstly, for a fixed evaluation (finite length or discount
factor), the value of a game with varying stage duration converges as the stage duration goes to
0. Secondly, the asymptotic behavior of the value (for large length or small discount factor) does
not depend on the stage duration.

In Section 8 we study the discretization of the continuous time game by approximating with exact
games and we prove convergence of the values in the finite length and discounted case as the stage
duration vanishes.

The last section provides concluding comments.

2. Stochastic games and Shapley operator 

Consider a two person zero-sum stochastic game G with a finite state space $\Omega$. I and J are
compact metric action spaces, X and Y are the sets of regular probabilities on the corresponding
Borel $\sigma$-algebra. $g$ is a bounded measurable payoff function from $\Omega \times I \times J$ to $\mathbb{R}$ (with multilinear
extension to $X \times Y$) and for each $(i, j) \in I \times J$, $P(i,j)$ is a transition probability from $\Omega$ to $\Delta(\Omega)$
(the set of probabilities on $\Omega$). $g$ and $P$ are separately continuous on I and J.

The game is played in stages. At stage n, knowing the state $\omega_n$, player 1 (resp. 2) chooses
$i_n \in I$ (resp. $j_n \in J$), the stage payoff is $g_n = g(\omega_n, i_n, j_n)$. The next state $\omega_{n+1}$ is selected
according to the probability $P(i_n, j_n)[\omega_n]$ and is announced to the players.

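One associates to G a Shapley operator, see Shapley [17], which is a map T from $F = \mathbb{R}^\Omega$ to
itself: $f \in F \mapsto T(f)$ defined by

(1) $T(f)(\omega) = \mathrm{val}_{(x,y) \in X \times Y} \{ g(\omega; x, y) + P(x,y)[\omega] \circ f \}, \quad \forall \omega \in \Omega$

where $\mathrm{val}_{X \times Y}$ is the max min = min max = value operator on $X \times Y$,
$P(x,y)[\omega](\omega') = \int_{I \times J} P(i,j)[\omega](\omega')\, x(di)\, y(dj)$ and for $R \in \mathbb{R}^\Omega$, $R \circ f = \sum_{\zeta \in \Omega} R(\zeta) f(\zeta)$.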

Note that T is a non expansive map. Moreover T is monotone and translates the constants (for
a converse result see, e.g., Kolokoltsov [5], and Sorin [19] for related consequences) but we will
not use these additional properties here.

One can consider two other frameworks with $\Omega$ standard Borel, where T is defined in a similar
way with $P(x,y)[\omega] \circ f = \int_\Omega f(\zeta)\, P(x,y)[\omega](d\zeta)$ and where F is either:

- the set of bounded measurable functions on $\Omega$ and $P(i,j)[\omega](A)$ is separately continuous in $(i,j)$
for each Borel subset $A \subset \Omega$ (see [6], Prop. VII.1.4),

- or the set of bounded continuous functions on $\Omega$ and both maps $(x, \omega) \mapsto I(x, y; \omega) = \int_\Omega f(\zeta)\, P(x,y)[\omega](d\zeta)$
and $(y, \omega) \mapsto I(x, y; \omega)$ are continuous for any bounded continuous function $f$ on $\Omega$ (see [6], Prop.
VII.1.5).

For more general conditions see Nowak [13], [14].

From now on we assume that one of these cases holds so that T is well defined from some
Banach space F to itself.

Recall that $V_n = T^n(0)$ is the value of the n-stage game with total evaluation $\sum_{m=1}^{n} g_m$ (as a
function of the initial state) so that the normalized value is $v_n = V_n / n$.

$W_\lambda$, which is the unique fixed point of $f \mapsto T((1 - \lambda) f)$ on F, is the un-normalized value of the
discounted game with total evaluation $\sum_{m=1}^{\infty} g_m (1 - \lambda)^{m-1}$ and the normalized discounted value
is $w_\lambda = \lambda W_\lambda$.
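The two evaluations above can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical toy game with two states and two actions per player (the arrays `G` and `P0` and all function names are illustrative and not taken from the paper), for which the value of each one-shot 2x2 matrix game is available in closed form. It computes $V_n = T^n(0)$ by iterating the Shapley operator and $W_\lambda$ by iterating the contraction $f \mapsto T((1-\lambda) f)$.

```python
def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m (the row player maximizes)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # best pure guarantee of the maximizer
    upper = min(max(a, c), max(b, d))   # best pure guarantee of the minimizer
    if lower >= upper:                  # pure saddle point
        return lower
    # No pure saddle point: classical mixed-strategy formula, written
    # relative to the entry a to limit cancellation errors.
    return a - (b - a) * (c - a) / ((d - a) - (b - a) - (c - a))

# Hypothetical toy data: 2 states, 2 actions per player.
G = [[[3.0, -1.0], [0.0, 2.0]],     # G[w][i][j]: stage payoff in state w
     [[1.0, 0.0], [-2.0, 4.0]]]
P0 = [[[0.9, 0.2], [0.5, 0.6]],     # P0[w][i][j]: probability of moving to state 0
      [[0.1, 0.7], [0.4, 0.3]]]

def shapley(f):
    """T(f)(w) = val over (i, j) of g(w, i, j) + expected f at the next state."""
    return [val2x2([[G[w][i][j] + P0[w][i][j] * f[0] + (1.0 - P0[w][i][j]) * f[1]
                     for j in range(2)] for i in range(2)])
            for w in range(2)]

def n_stage_value(n):
    """V_n = T^n(0), the un-normalized value of the n-stage game."""
    v = [0.0, 0.0]
    for _ in range(n):
        v = shapley(v)
    return v

def discounted_value(lam, tol=1e-12):
    """W_lam: fixed point of f -> T((1 - lam) f), by Picard iteration."""
    w = [0.0, 0.0]
    while True:
        new = shapley([(1.0 - lam) * x for x in w])
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new
```

With these assumed data, dividing each component of `n_stage_value(n)` by n gives the normalized value $v_n$, and multiplying each component of `discounted_value(lam)` by `lam` gives $w_\lambda$.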
3. Stochastic games with varying stage duration

Let us introduce, for each $(i,j) \in I \times J$, the kernel $Q(i,j)$ such that $P(i,j) = Id + Q(i,j)$ and
write $G = (g, Q)$ for a stochastic game defined as above. One introduces two families of varying
stage duration games, see Neyman [10], associated to G.

3.1. Exact sequence. 

Consider G as a game with stage duration one. Given a step size $h \in (0,1]$, define an “exact”
game $G^h$ with stage duration h, stage payoff $hg$ and stage transition $P_h = Id + hQ$. That is,
$G^h = (hg, hQ)$.

$G^h$ appears as a linearization of the game G. During a stage of duration h both the payoff and
the state variation are proportional with factor h to those of a stage of duration one.

Definition 3.1. Given $h \in [0,1]$, let $T_h = (1 - h) Id + h T$.

Then one has: 

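Proposition 3.1.

If T is the Shapley operator of G, then $T_h$ is the Shapley operator of the game $G^h$.

Proof. Since $T_h(f) = (1 - h) f + h\, \mathrm{val}\{g + P \circ f\} = (1 - h) f + \mathrm{val}\{h g + h (Id + Q) \circ f\}$, one
obtains

(2) $T_h(f) = \mathrm{val}\{h g + P_h \circ f\}$

with $P_h = Id + hQ$, since the term $(1 - h) f$ does not depend on the actions and can be moved
inside the value operator.

Hence $T_h$ is the one stage operator associated to the game $G^h$. □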
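The identity (2) can be checked numerically. The sketch below assumes the same kind of hypothetical two-state, 2x2-action toy game (the data `G`, `P0` and the helper names are illustrative assumptions, not part of the paper): it builds the data $(hg, Id + hQ)$ of the exact game, applies its one-stage operator, and compares with $(1-h) f + h T(f)$.

```python
def val2x2(m):
    """Value of the 2x2 zero-sum matrix game m (the row player maximizes)."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower >= upper:                  # pure saddle point
        return lower
    return a - (b - a) * (c - a) / ((d - a) - (b - a) - (c - a))

# Hypothetical toy data: G[w][i][j] payoff, P0[w][i][j] probability of going to state 0.
G = [[[3.0, -1.0], [0.0, 2.0]], [[1.0, 0.0], [-2.0, 4.0]]]
P0 = [[[0.9, 0.2], [0.5, 0.6]], [[0.1, 0.7], [0.4, 0.3]]]

def one_stage(f, g, p0):
    """One-stage operator f -> val{ g + P o f } for payoff g and transition p0."""
    return [val2x2([[g[w][i][j] + p0[w][i][j] * f[0] + (1.0 - p0[w][i][j]) * f[1]
                     for j in range(2)] for i in range(2)])
            for w in range(2)]

def exact_game(h):
    """Data (h g, Id + h Q) of the exact game G^h, where Q = P - Id."""
    stay0 = [1.0, 0.0]   # Id: from state w, probability of being in state 0
    gh = [[[h * G[w][i][j] for j in range(2)] for i in range(2)] for w in range(2)]
    ph0 = [[[(1.0 - h) * stay0[w] + h * P0[w][i][j] for j in range(2)]
            for i in range(2)] for w in range(2)]
    return gh, ph0

def fractional(f, h):
    """T_h(f) = (1 - h) f + h T(f)."""
    tf = one_stage(f, G, P0)
    return [(1.0 - h) * x + h * y for x, y in zip(f, tf)]
```

For any test function `f` and step `h`, `fractional(f, h)` and `one_stage(f, *exact_game(h))` should agree up to rounding, which is the content of Proposition 3.1 on this example.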

We will consider the finitely repeated games and the discounted games associated to $G^h$.
Natural questions are, in the finite case:

1) given a total length M, what is the asymptotic behavior of the value of the N-stage game with
stage duration h, as h vanishes and $Nh = M$?

2) what is the asymptotic behavior of the value, as $Nh$ goes to $\infty$?

and similarly in the discounted framework.

These topics will be addressed in the general setting of a non expansive map T in Sections 4 
and 5. In both cases we will obtain explicit formulations for the limits. 

3.2. Discretization. 

Let $G = (g, Q)$ be a stochastic game with a finite state space. We consider here a continuous
time jointly controlled Markov process associated to the kernel Q.

Explicitly, define $P^t(i,j)$ as the continuous time homogeneous Markov chain on $\Omega$, indexed by
$\mathbb{R}^+$, with generator $Q(i,j)$:

(3) $\dot P^t(i,j) = P^t(i,j)\, Q(i,j)$.

Given a stepsize $h \in (0,1]$, $\bar G^h$ has to be considered as the discretization with mesh h of the game
in continuous time $\bar G$ where the state variable follows $P^t$ and is controlled by both players, see

m, EB, m, 0- 

More precisely the players act at time $s = kh$ by choosing actions $(i_s, j_s)$ (at random according
to some $x_s$, resp. $y_s$), knowing the current state. Between time s and s + h, the state $\omega_t$ evolves
with conditional law $P^t$ following (3) with $Q(i_s, j_s)$ and $P^0 = Id$.

The associated Shapley operator of this stochastic game is $\bar T_h$ with

$\bar T_h(f) = \mathrm{val}_{X \times Y}\{ g^h + P^h \circ f \}$

where $g^h(\omega_0, x, y)$ stands for $E[\int_0^h g(\omega_t; x, y)\, dt]$ and $P^h(x,y) = \int_{I \times J} P^h(i,j)\, x(di)\, y(dj)$.

The corresponding finitely repeated and discounted games will be analyzed in Section 8. 




4. Finite iterations of non expansive maps and evolution equations 

Consider a non expansive map T from a Banach space Z to itself. In this section we recall 
basic results concerning its iterations and the corresponding discrete and continuous dynamics. 

4.1. Finite iteration. 

The n-stage iteration starting from $z \in Z$ is $U_n = T^n(z)$ hence satisfies

$U_n - U_{n-1} = -(Id - T)(U_{n-1})$

which can be considered as a discretization of the differential equation

(4) $\dot f_t = -(Id - T) f_t, \quad f_0 = z$.

(Note that this is a special case of the differential inclusion $\dot f_t \in -A f_t$, for the accretive (maximal
monotone) operator $A = Id - T$.)

The comparison between the iterates of T and the solution $f_t(z)$ of the differential equation
(4) is given by the generalized Chernoff’s formula [7], [2], see, e.g., Brezis [1], p.16:

Proposition 4.1.

(5) $\|f_t(z) - T^n(z)\| \leq \|z - T(z)\| \sqrt{t + (n - t)^2}$.

In particular with z = 0 and t = n, one obtains

(6) $\left\| \frac{f_n(0)}{n} - v_n \right\| \leq \frac{\|T(0)\|}{\sqrt{n}}$

where as before, $T^n(0) = V_n = n v_n$.

Given $h \in (0,1]$, a change of time shows that $f_{t/h}(z)$ is the solution of

(7) $\dot g_t = -\frac{(Id - T) g_t}{h}, \quad g_0 = z$.


4.2. Interpolation. 

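Given $h \in [0,1]$ introduce again:

(8) $T_h = (1 - h) Id + h T$.

Since $Id - T_h = h (Id - T)$, the change of time (7) (with $1/h$ in place of $h$) shows that the solution of

$\dot g_t = -(Id - T_h) g_t, \quad g_0 = z,$

is $g_t = f_{th}(z)$. Applying (5) to $T_h$ and rescaling time, one obtains

(9) $\|f_t(z) - T_h^n(z)\| \leq \|z - Tz\| \sqrt{th + (nh - t)^2}$,

hence in particular with $h = t/n$

(10) $\|f_t(z) - T_{t/n}^n(z)\| \leq \|z - Tz\| \frac{t}{\sqrt{n}}$,

or

(11) $\|f_{nh}(z) - T_h^n(z)\| \leq \|z - Tz\|\, h \sqrt{n}$.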
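As a sanity check of estimate (11), here is a small sketch on the real line, assuming the affine map $T(z) = Az + B$ with $|A| \leq 1$ (an illustrative choice, not from the paper), for which the solution of (4) is known in closed form: $f_t(z) = z^* + (z - z^*) e^{-(1-A)t}$ with $z^* = B/(1-A)$. The constants `A`, `B` and the function names are hypothetical.

```python
import math

A, B = -1.0, 1.0   # T(z) = A*z + B; |A| <= 1 makes T non expansive on the real line

def T(z):
    return A * z + B

def T_h(z, h):
    """Fractional operator T_h = (1 - h) Id + h T."""
    return (1.0 - h) * z + h * T(z)

def iterate(z, h, n):
    """n-th iterate of T_h starting from z."""
    for _ in range(n):
        z = T_h(z, h)
    return z

def flow(z, t):
    """Closed-form solution of f' = (T - Id) f, f_0 = z, for the affine map above."""
    z_star = B / (1.0 - A)
    return z_star + (z - z_star) * math.exp(-(1.0 - A) * t)

def chernoff_gap(z, h, n):
    """Return (error, bound) for inequality (11) at time t = n*h."""
    error = abs(flow(z, n * h) - iterate(z, h, n))
    bound = abs(z - T(z)) * h * math.sqrt(n)
    return error, bound
```

For a fixed horizon $nh$, refining the step (for instance `chernoff_gap(0.0, 0.01, 100)` against `chernoff_gap(0.0, 0.1, 10)`) reduces both the error and the bound, in line with the $h\sqrt{n} = \sqrt{h \cdot nh}$ scaling.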
4.3. Eulerian schemes.

More generally for a sequence of step sizes $\{h_k\}$ in [0,1] one defines inductively an Eulerian scheme
$\{z_k\}$ by

$z_{k+1} - z_k = h_{k+1} (T - Id)(z_k)$

or

$z_{k+1} = T_{h_{k+1}}(z_k)$.

For two sequences $\{h_k\}, \{\hat h_\ell\}$ in [0,1], with associated Eulerian schemes $\{z_k\}, \{\hat z_\ell\}$,
Vigeral [23] obtains

Proposition 4.2.

(12) $\|\hat z_\ell - z_k\| \leq \|z_0 - z\| + \|\hat z_0 - z\| + \|z - Tz\| \sqrt{(\sigma_k - \hat\sigma_\ell)^2 + \tau_k + \hat\tau_\ell}, \quad \forall z \in Z,$

(13) $\|f_t(z) - z_k\| \leq \|z - Tz\| \sqrt{(\sigma_k - t)^2 + \tau_k}$,

with $z_0 = z$, $\sigma_k = \sum_{i=1}^{k} h_i$, $\tau_k = \sum_{i=1}^{k} h_i^2$, $\hat\sigma_\ell = \sum_{j=1}^{\ell} \hat h_j$, $\hat\tau_\ell = \sum_{j=1}^{\ell} \hat h_j^2$.

In particular this gives, in the uniform case $h_i = h$, $\forall i$,

(14) $\|f_t(z) - T_h^n(z)\| \leq \|z - Tz\| \sqrt{n h^2 + (nh - t)^2}$

and coincides with (10) and (11) at $t = nh$.

4.4. Two approximations. 

Equation (14) or more generally (13) corresponds to two approximations:

i) Comparison on a compact interval [0, M] of $f_t$ to the linear interpolation of $T_h^m(0)$, m =
0, ..., n, which is, using (9):

$\|f_t(0) - T_h^{t/h}(0)\| \leq K \sqrt{h}, \quad \forall t \in [0, M],$

for some constant K, where $T_h^{t/h}(0)$ denotes this interpolation at time t.

Or more generally if one considers a sequence of step sizes $\{h_i\}$, i = 1, ..., k with $\sigma_k = \sum_{i=1}^{k} h_i =
M$, $h_i \leq h$, $\forall i$ and $\Pi_i T_{h_i} = T_{h_1} \circ \cdots \circ T_{h_k}$ one has:

(15) $\|f_M(0) - \Pi_i T_{h_i}(0)\| \leq K \sqrt{hM}$.

Thus the composite iteration $\Pi_i T_{h_i}$ converges to the solution of (4) as the mesh h goes to 0.

ii) Asymptotic comparison of the behavior of $f_t$, solution of (4), and iterations of the form
$\Pi_i T_{h_i}$ with step size $h_i \leq 1$ and total length $\sigma_k = t$:

(16) $\|f_t(0) - \Pi_i T_{h_i}(0)\| \leq K \sqrt{t}$.

5. Discounted iterations of non expansive maps 

5.1. General properties. 

For $\lambda \in (0,1]$ denote again by $W_\lambda$ the unique fixed point of $z \mapsto T((1 - \lambda) z)$ and let $w_\lambda = \lambda W_\lambda$.
We recall some basic evaluations, see e.g. [22].

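Proposition 5.1. For all $\lambda, \mu \in (0,1]$,

$\|w_\lambda\| \leq \|T(0)\|$ and $\|w_\lambda - w_\mu\| \leq 2 \|T(0)\| \left| 1 - \frac{\lambda}{\mu} \right|$.

Proof. First, one has:

$\|W_\lambda\| - \|T(0)\| \leq \|W_\lambda - T(0)\| = \|T([1 - \lambda] W_\lambda) - T(0)\| \leq (1 - \lambda) \|W_\lambda\|$

which implies $\lambda \|W_\lambda\| \leq \|T(0)\|$.

Moreover:

$\|W_\lambda - W_\mu\| = \|T([1 - \lambda] W_\lambda) - T([1 - \mu] W_\mu)\|$

$\leq \|[1 - \lambda] W_\lambda - [1 - \mu] W_\mu\|$

$\leq (1 - \lambda) \|W_\lambda - W_\mu\| + |\lambda - \mu| \|W_\mu\|$

thus

$\lambda \|W_\lambda - W_\mu\| \leq |\lambda - \mu| \|W_\mu\| \leq \frac{|\lambda - \mu|}{\mu} \|T(0)\|$.

On the other hand:

$\|w_\lambda - w_\mu\| \leq \lambda \|W_\lambda - W_\mu\| + |\lambda - \mu| \|W_\mu\|$

hence

$\|w_\lambda - w_\mu\| \leq 2 |\lambda - \mu| \|W_\mu\|$

and the result follows from $\|W_\mu\| \leq \|T(0)\| / \mu$. □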


5.2. Discounted values. 

For any non expansive operator T on Z and $h \in (0,1]$, introduce

$W^h_\lambda = T_h((1 - \lambda h) W^h_\lambda)$

as the unique fixed point of $u \mapsto T_h((1 - \lambda h) u)$ and define

(17) $w^h_\lambda = \lambda W^h_\lambda = \lambda T_h\left(\frac{1 - \lambda h}{\lambda} w^h_\lambda\right)$.

$W^h_\lambda$ is the un-normalized $\lambda$-evaluation computed through a stage of duration h using the
linearization $T_h$ of T and $w^h_\lambda$ is the associated normalization.

Recall that for h = 1, $T_h = T$ and $w^1_\lambda = w_\lambda$.

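Proposition 5.2.

$w^h_\lambda = w_\mu$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$.

Proof. By definition of $T_h$,

$w^h_\lambda = \lambda ((1 - h) Id + h T)\left(\frac{1 - \lambda h}{\lambda} w^h_\lambda\right)$

$= (1 - h)(1 - \lambda h) w^h_\lambda + \lambda h\, T\left(\frac{1 - \lambda h}{\lambda} w^h_\lambda\right)$.

Hence

$(1 + \lambda - \lambda h) w^h_\lambda = \lambda T\left(\frac{1 - \lambda h}{\lambda} w^h_\lambda\right)$

which is

$w^h_\lambda = \mu T\left(\frac{1 - \mu}{\mu} w^h_\lambda\right)$

for $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$.

The non expansiveness of T yields uniqueness, hence the result. □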
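The identity of Proposition 5.2 can be observed numerically. The sketch below assumes an illustrative map `T` of the plane that is non expansive for the sup norm (it is not an operator from the paper), and computes $w^h_\lambda$ as the fixed point of $w \mapsto \lambda T_h(\frac{1-\lambda h}{\lambda} w)$ by Picard iteration; all names are hypothetical.

```python
def T(z):
    """An illustrative map of the plane, non expansive for the sup norm."""
    x, y = z
    return (max(y + 1.0, 0.5 * (x + y)), min(x, y) + 2.0)

def T_frac(z, h):
    """Fractional operator T_h = (1 - h) Id + h T."""
    tz = T(z)
    return tuple((1.0 - h) * a + h * b for a, b in zip(z, tz))

def discounted(lam, h=1.0, tol=1e-13):
    """Normalized value w^h_lam: fixed point of w -> lam * T_h((1 - lam*h)/lam * w)."""
    scale = (1.0 - lam * h) / lam
    w = (0.0, 0.0)
    while True:   # the map is (1 - lam*h)-contracting, so the iteration converges
        new = tuple(lam * c for c in T_frac(tuple(scale * c for c in w), h))
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new

def mu_of(lam, h):
    """Equivalent discount factor mu = lam / (1 + lam - lam*h) of Proposition 5.2."""
    return lam / (1.0 + lam - lam * h)
```

Under these assumptions `discounted(lam, h)` and `discounted(mu_of(lam, h), 1.0)` should coincide up to the iteration tolerance.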
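5.3. Vanishing duration.

Introduce $D^h_\lambda$, the auxiliary one stage operator associated to the $\lambda$-discounted evaluation of $T_h$,
defined by

$D^h_\lambda z = \lambda T_h\left(\frac{1 - \lambda h}{\lambda} z\right)$

which is $(1 - \lambda h)$ contracting.

In particular $D^h_\lambda(w^h_\lambda) = w^h_\lambda$ and for h = 1, $w^1_\lambda = w_\lambda$.

Proposition 5.3. For any $z \in Z$

$w^h_\lambda = \lim_{n \to +\infty} (D^h_\lambda)^n z$

and $w^h_\lambda \to w_{\frac{\lambda}{1+\lambda}}$ as $h \to 0$ with

$\|w^h_\lambda - w_{\frac{\lambda}{1+\lambda}}\| \leq C \lambda h$.

Proof. The first equality follows from definition (17).
Since $w^h_\lambda = w_\mu$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$ (Proposition 5.2), Proposition 5.1 gives

$\|w^h_\lambda - w_{\frac{\lambda}{1+\lambda}}\| \leq C \left| 1 - \frac{1 + \lambda - \lambda h}{1 + \lambda} \right| \leq C \lambda h$. □

More generally one can consider a sequence of step sizes $\{h_i\}$ with $h_i \leq h$ and $\sum_i h_i = +\infty$
and the associated operator $\Pi_i D^{h_i}_\lambda$.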

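Lemma 5.1. For any $z \in Z$ and any sequence $h_1, \dots, h_n$,

$\|\Pi_{i=1}^{n} D^{h_i}_\lambda(z) - w_{\frac{\lambda}{1+\lambda}}\| \leq 2 \|T(0)\| \max_{1 \leq i \leq n} h_i + (\|T(0)\| + \|z\|)\, \Pi_{i=1}^{n} (1 - \lambda h_i)$.

Proof. By non expansiveness,

$\|\Pi_{i=1}^{n} D^{h_i}_\lambda(z) - \Pi_{i=1}^{n} D^{h_i}_\lambda(w_{\frac{\lambda}{1+\lambda}})\| \leq \|z - w_{\frac{\lambda}{1+\lambda}}\|\, \Pi_{i=1}^{n} (1 - \lambda h_i) \leq (\|T(0)\| + \|z\|)\, \Pi_{i=1}^{n} (1 - \lambda h_i)$.

Hence it is enough to show that $\|\Pi_{i=1}^{n} D^{h_i}_\lambda(w_{\frac{\lambda}{1+\lambda}}) - w_{\frac{\lambda}{1+\lambda}}\| \leq 2 \|T(0)\| \max_{1 \leq i \leq n} h_i$.

Let $d_k = \|\Pi_{i=k}^{n} D^{h_i}_\lambda(w_{\frac{\lambda}{1+\lambda}}) - w_{\frac{\lambda}{1+\lambda}}\|$. Then

$d_k \leq \|\Pi_{i=k}^{n} D^{h_i}_\lambda(w_{\frac{\lambda}{1+\lambda}}) - D^{h_k}_\lambda(w_{\frac{\lambda}{1+\lambda}})\| + \|D^{h_k}_\lambda(w_{\frac{\lambda}{1+\lambda}}) - w_{\frac{\lambda}{1+\lambda}}\|$

$\leq (1 - \lambda h_k)\, d_{k+1} + \|D^{h_k}_\lambda(w_{\frac{\lambda}{1+\lambda}}) - w_{\frac{\lambda}{1+\lambda}}\|$.

Now, for any h,

$\|D^h_\lambda(w_{\frac{\lambda}{1+\lambda}}) - w_{\frac{\lambda}{1+\lambda}}\| = \|(1 - h)(1 - \lambda h) w_{\frac{\lambda}{1+\lambda}} + \lambda h\, T(\frac{1 - \lambda h}{\lambda} w_{\frac{\lambda}{1+\lambda}}) - w_{\frac{\lambda}{1+\lambda}}\|$

$\leq \lambda h^2 \|w_{\frac{\lambda}{1+\lambda}}\| + \|\lambda h\, T(\frac{1 - \lambda h}{\lambda} w_{\frac{\lambda}{1+\lambda}}) - h (1 + \lambda) w_{\frac{\lambda}{1+\lambda}}\|$

$= \lambda h^2 \|w_{\frac{\lambda}{1+\lambda}}\| + \|\lambda h\, T(\frac{1 - \lambda h}{\lambda} w_{\frac{\lambda}{1+\lambda}}) - \lambda h\, T(\frac{1}{\lambda} w_{\frac{\lambda}{1+\lambda}})\|$

$\leq 2 \lambda h^2 \|w_{\frac{\lambda}{1+\lambda}}\|$

$\leq 2 \|T(0)\| \lambda h^2$.

Hence

$d_k \leq (1 - \lambda h_k)\, d_{k+1} + 2 \|T(0)\| \lambda h_k^2$

$= (1 - \lambda h_k)\, d_{k+1} + \lambda h_k (2 \|T(0)\| h_k)$

$\leq \max(d_{k+1}, 2 \|T(0)\| h_k)$.

Since $d_{n+1} = 0$ we get $d_1 \leq 2 \|T(0)\| \max_{1 \leq i \leq n} h_i$ as claimed. □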
In particular one gets

Proposition 5.4. For any $z \in Z$, and any sequence $\{h_i\}$ with $h_i \leq h$ and $\sum_i h_i = +\infty$,

$\|\Pi_{i=1}^{\infty} D^{h_i}_\lambda(z) - w_{\frac{\lambda}{1+\lambda}}\| \leq 2 \|T(0)\| h$.
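The estimate of Lemma 5.1, of which Proposition 5.4 is the limiting case, can be tested on a finite composition with varying step sizes. The sketch below is only illustrative: the map `T` is an assumed sup-norm non expansive map of the plane, the step sequence is arbitrary, and all function names are hypothetical.

```python
def T(z):
    """An illustrative map of the plane, non expansive for the sup norm."""
    x, y = z
    return (max(y + 1.0, 0.5 * (x + y)), min(x, y) + 2.0)

def norm(z):
    return max(abs(c) for c in z)

def D(z, lam, h):
    """D^h_lam z = lam * T_h((1 - lam*h)/lam * z), with T_h = (1 - h) Id + h T."""
    u = tuple((1.0 - lam * h) / lam * c for c in z)
    tu = T(u)
    return tuple(lam * ((1.0 - h) * a + h * b) for a, b in zip(u, tu))

def limit_value(lam, tol=1e-13):
    """w_nu for nu = lam / (1 + lam), computed as the fixed point of D^1_nu."""
    nu = lam / (1.0 + lam)
    w = (0.0, 0.0)
    while True:
        new = D(w, nu, 1.0)
        if norm(tuple(a - b for a, b in zip(new, w))) < tol:
            return new
        w = new

def compose(z, lam, steps):
    """D^{h_1} o ... o D^{h_n} applied to z (the last step acts first)."""
    for h in reversed(steps):
        z = D(z, lam, h)
    return z

def lemma_bound(z, lam, steps):
    """Right-hand side of the estimate of Lemma 5.1."""
    t0 = norm(T((0.0, 0.0)))
    prod = 1.0
    for h in steps:
        prod *= 1.0 - lam * h
    return 2.0 * t0 * max(steps) + (t0 + norm(z)) * prod
```

As the total duration `sum(steps)` grows, the product term in `lemma_bound` vanishes and only the mesh-dependent term $2\|T(0)\| \max_i h_i$ remains.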

5.4. Asymptotic properties. 

An easy consequence of Proposition 5.2 is that for a given h, $w^h_\lambda$ has the same asymptotic behavior,
as $\lambda$ tends to 0, as $w_\lambda$.

Proposition 5.5.

$\|w^h_\lambda - w_\lambda\| \leq 2C\lambda$.

Proof. By Proposition 5.1,

$\|w^h_\lambda - w_\lambda\| \leq 2C\, |1 - (1 + \lambda - \lambda h)| \leq 2C\lambda$. □


To generalize this property in order to apply it to games with varying duration we need an 
additional assumption on the operator T. 

Definition 5.1. The operator T satisfies assumption (H) if there exist two nondecreasing functions
$k : ]0,1] \to \mathbb{R}^+$ and $\ell : [0, +\infty[ \to \mathbb{R}^+$ with $k(\lambda) = o(\sqrt{\lambda})$ as $\lambda$ goes to 0 and

$\|D^1_\lambda(z) - D^1_\mu(z)\| \leq k(|\lambda - \mu|)\, \ell(\|z\|)$

for all $(\lambda, \mu) \in ]0,1]^2$ and $z \in Z$.

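Proposition 5.6. If T satisfies (H) then for any $z \in Z$ and any sequence $\{h_i\}$ with $\sum_i h_i = +\infty$,
$\|\Pi_{i=1}^{\infty} D^{h_i}_\lambda(z) - w_\lambda\|$ goes to 0 as $\lambda$ goes to 0.

Proof. Since $D^{h_i}_\lambda$ is $(1 - \lambda h_i)$ contracting and $\sum_i h_i = +\infty$, $\Pi_{i=1}^{\infty} D^{h_i}_\lambda(z)$ is independent of z and one
may assume $z = w_\lambda$.

Define $d_n = \|\Pi_{i=1}^{n} D^{h_i}_\lambda(w_\lambda) - w_\lambda\|$ hence $d_0 = 0$ and

$d_n \leq \|\Pi_{i=1}^{n} D^{h_i}_\lambda(w_\lambda) - D^{h_n}_\lambda(w_\lambda)\| + \|D^{h_n}_\lambda(w_\lambda) - w_\lambda\|$

$\leq (1 - \lambda h_n)\, d_{n-1} + \|D^{h_n}_\lambda(w_\lambda) - w_\lambda\|$.

For any h,

$\|D^h_\lambda(w_\lambda) - w_\lambda\| = \|(1 - h)(1 - \lambda h) w_\lambda + \lambda h\, T(\frac{1 - \lambda h}{\lambda} w_\lambda) - w_\lambda\|$

$= h (1 + \lambda - \lambda h) \left\| \frac{\lambda}{1 + \lambda - \lambda h} T(\frac{1 - \lambda h}{\lambda} w_\lambda) - w_\lambda \right\|$

$= h (1 + \lambda - \lambda h)\, \|D^1_\mu(w_\lambda) - D^1_\lambda(w_\lambda)\|$ with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$

$\leq h (1 + \lambda - \lambda h)\, \ell(\|w_\lambda\|)\, k\left(\frac{\lambda^2 (1 - h)}{1 + \lambda - \lambda h}\right)$ by (H)

$\leq 2 h\, \ell(\|T(0)\|)\, k(\lambda^2)$.

Hence

$d_n \leq (1 - \lambda h_n)\, d_{n-1} + \lambda h_n\, \frac{2 \ell(\|T(0)\|)\, k(\lambda^2)}{\lambda}$

$\leq \max\left(d_{n-1},\, 2 \ell(\|T(0)\|) \frac{k(\lambda^2)}{\lambda}\right)$

and $d_n \leq 2 \ell(\|T(0)\|) \frac{k(\lambda^2)}{\lambda}$ for all n. The result follows since by assumption $k(\lambda^2) = o(\lambda)$. □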
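5.5. Invariant properties.

We now consider another family of operators parametrized by $\alpha \in [0,1]$.
Define for $\alpha \in [0,1]$, $T_\alpha$ by

(18) $T_\alpha z = (1 - \alpha) z + T(\alpha z)$.

Thus $T_\alpha$ is non expansive, hence for $\lambda \in ]0,1]$ one can consider the associated $\lambda$-discounted fixed
point defined by

$\hat w^\alpha_\lambda = \lambda T_\alpha\left(\frac{1 - \lambda}{\lambda} \hat w^\alpha_\lambda\right)$.

Note that for $\alpha = 1$, $\hat w^1_\lambda = w_\lambda$.

Proposition 5.7.

$\hat w^\alpha_\lambda = w_\mu$, with $\mu = \frac{\lambda}{\alpha + \lambda - \lambda\alpha}$.

Proof. Direct computation gives

$\hat w^\alpha_\lambda = (1 - \alpha)(1 - \lambda) \hat w^\alpha_\lambda + \lambda T\left(\alpha \frac{1 - \lambda}{\lambda} \hat w^\alpha_\lambda\right)$.

Thus

$(\alpha + \lambda - \lambda\alpha)\, \hat w^\alpha_\lambda = \lambda T\left(\alpha \frac{1 - \lambda}{\lambda} \hat w^\alpha_\lambda\right)$

which is

$\hat w^\alpha_\lambda = \mu T\left(\frac{1 - \mu}{\mu} \hat w^\alpha_\lambda\right)$

for $\mu = \frac{\lambda}{\alpha + \lambda - \lambda\alpha}$, hence the result. □

Corollary 5.1. For $\lambda < 1/2$, $\hat w^{\frac{\lambda}{1-\lambda}}_\lambda$ does not depend on $\lambda$ and equals $w_{1/2}$.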
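The invariance stated in Corollary 5.1 lends itself to a quick numerical check. The sketch below assumes an illustrative sup-norm non expansive map `T` of the plane (not an operator from the paper) and hypothetical function names; it computes $\hat w^\alpha_\lambda$ by Picard iteration on $w \mapsto \lambda T_\alpha(\frac{1-\lambda}{\lambda} w)$, which is $(1-\lambda)$-contracting.

```python
def T(z):
    """An illustrative map of the plane, non expansive for the sup norm."""
    x, y = z
    return (max(y + 1.0, 0.5 * (x + y)), min(x, y) + 2.0)

def T_alpha(z, alpha):
    """T_alpha z = (1 - alpha) z + T(alpha z), as in (18)."""
    tz = T(tuple(alpha * c for c in z))
    return tuple((1.0 - alpha) * a + b for a, b in zip(z, tz))

def discounted_alpha(lam, alpha, tol=1e-13):
    """Fixed point of w -> lam * T_alpha((1 - lam)/lam * w)."""
    scale = (1.0 - lam) / lam
    w = (0.0, 0.0)
    while True:
        new = tuple(lam * c for c in T_alpha(tuple(scale * c for c in w), alpha))
        if max(abs(a - b) for a, b in zip(new, w)) < tol:
            return new
        w = new

def invariant_value(lam):
    """Value for the coupled choice alpha = lam / (1 - lam), with lam < 1/2."""
    return discounted_alpha(lam, lam / (1.0 - lam))
```

Under these assumptions `invariant_value(lam)` should return the same vector for every `lam` below 1/2, namely `discounted_alpha(0.5, 1.0)`, which is $w_{1/2}$.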


6. Finitely repeated exact games 
We consider the family of exact games $G^h = (hg, hQ)$ with $h \in [0,1]$.

6.1. Approximation of the value.

The recursive equation for the un-normalized value $V^h_n$ of the n-stage game (of total length
nh) is given by:

$V^h_n(\omega) = \mathrm{val}[h g(\omega, \cdot) + P_h(\cdot)[\omega] \circ V^h_{n-1}]$

so that

$V^h_n = T_h V^h_{n-1} = T_h^n(0)$.

Let $f$ be the solution of (4) with T satisfying (1). Using the results of Section 4, in particular
(11), we obtain:

Proposition 6.1.

There exists a constant L such that for all n and $h \in [0,1]$

$\|V^h_n - f_{nh}(0)\| \leq L h \sqrt{n}$.
6.2. Vanishing step sizes.

The previous Proposition 6.1 shows that $v^h_n = \frac{V^h_n}{nh}$, which is the normalized value of the n-stage
game with length $N = nh$, satisfies:

$\left\| v^h_n - \frac{f_N(0)}{N} \right\| \leq \frac{L}{\sqrt{n}}$.

In fact Proposition 4.2 induces a more precise result for vanishing stage duration, that we now
describe.

Given t > 0 and a finite partition $H_t$ of [0,t], $t_0 = 0, \dots, t_k = t$, induced by step sizes $\{h_i\}$, $1 \geq
h_i > 0$, i = 1, ..., k, $\sum_{i \leq j} h_i = t_j$, we define its mesh as $m(H_t) = \max_i h_i$.

We consider the k stage game where the duration of stage i is $h_i$. Let $U(H_t)$ be its un-normalized
value (the normalized value is $u(H_t) = U(H_t)/t$). Thus $U(H_t) = T_{h_1} \circ \cdots \circ T_{h_k}(0)$.

Definition 6.1. $V_t$ is the limit value on [0,t] if for any sequence of partitions $\{H^n_t\}$ of [0,t] with
vanishing mesh, the sequence of values $\{U(H^n_t)\}$ of the corresponding games converges to $V_t$.

Proposition 6.2.

There exists a constant L' such that for any $H_t$ with $m(H_t) \leq h$, the un-normalized value $U(H_t)$
satisfies

$\|U(H_t) - f_t(0)\| \leq L' \sqrt{ht}$.

Thus the limit value $V_t$ exists and is given by $V_t = f_t(0)$.

Proof. The inequality is obtained from equation (13) with $\sigma_k = t$ and $\tau_k \leq ht$.

The existence of $V_t$ follows. □
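One practical reading of Proposition 6.2 is that two partitions of [0,t] with small mesh give nearly the same un-normalized value: by the triangle inequality, their values differ by at most $2 L' \sqrt{ht}$. The sketch below illustrates this on an assumed sup-norm non expansive map `T` of the plane, taking $L' = \|T(0)\|$ as suggested by (13) with z = 0; the map, the partitions and the function names are illustrative assumptions.

```python
import math

def T(z):
    """An illustrative map of the plane, non expansive for the sup norm."""
    x, y = z
    return (max(y + 1.0, 0.5 * (x + y)), min(x, y) + 2.0)

def norm(z):
    return max(abs(c) for c in z)

def euler_value(steps):
    """U(H_t) = T_{h_1} o ... o T_{h_k}(0) for the partition with these step sizes."""
    z = (0.0, 0.0)
    for h in reversed(steps):          # the last stage is evaluated first
        tz = T(z)
        z = tuple((1.0 - h) * a + h * b for a, b in zip(z, tz))
    return z

def partition_gap(steps_a, steps_b):
    """Sup-norm distance between the values of two partitions."""
    ua, ub = euler_value(steps_a), euler_value(steps_b)
    return norm(tuple(a - b for a, b in zip(ua, ub)))

def two_partition_bound(h, t):
    """2 * ||T(0)|| * sqrt(h t): Proposition 6.2 applied twice, assuming L' = ||T(0)||."""
    return 2.0 * norm(T((0.0, 0.0))) * math.sqrt(h * t)
```

For instance a uniform partition of [0, 2] with step 0.02 and one alternating steps 0.01 and 0.03 both have mesh at most 0.03, so `partition_gap` for this pair should not exceed `two_partition_bound(0.03, 2.0)`.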

The interpretation of these results is twofold:

first, the value of the game with finite length is essentially independent of the duration of the
stages when this duration is small enough,

second, this value is given by the solution of the associated differential equation (4).


Note that Vt equals also the value of the continuous time game of length t introduced in Neyman 

M- 


6.3. Asymptotic analysis. 

A further consequence of Proposition 6.2 is that for any t and any k-stage game associated to a
finite partition $H_t$, with normalized value $u(H_t)$, one has:

Proposition 6.3.

There exists L' such that for any $H_t$

$\left\| u(H_t) - \frac{f_t(0)}{t} \right\| \leq \frac{L'}{\sqrt{t}}$.

In particular the asymptotic behavior of the (normalized) value of the game depends only on its
total length t (and not on the durations of the individual stages) up to a term $O(\frac{1}{\sqrt{t}})$. Again the
comparison quantity is given by the normalized solution of the associated differential equation (4).


7. Discounted exact games 

7.1. Values of discounted exact game. 

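We follow the definition of Neyman (eq. (3) p. 254 in [10]): the (normalized) value $w^h_\lambda$ of the $\lambda$
discounted game $G^h$ is the unique solution of

$w^h_\lambda(\omega) = \mathrm{val}_{X \times Y}[h\lambda\, g(\omega, x, y) + (1 - h\lambda)\, P_h(x,y)[\omega] \circ w^h_\lambda]$

with $P_h = Id + hQ$.

In particular, for h = 1 one recovers $w^1_\lambda = w_\lambda$ (see 2.5) associated to T defined in (1).
The notation is consistent with the previous Section 5 since one has

Proposition 7.1. $w^h_\lambda$ corresponds to the solution of (17).

Proof.

$w^h_\lambda = \lambda T_h\left(\frac{1 - \lambda h}{\lambda} w^h_\lambda\right) = \lambda\, \mathrm{val}_{X \times Y}\left[h g(\omega, x, y) + P_h(x,y)[\omega] \circ \frac{1 - \lambda h}{\lambda} w^h_\lambda\right]$.

Hence $w^h_\lambda = \mathrm{val}_{X \times Y}[h\lambda\, g(\omega, x, y) + (1 - h\lambda)\, P_h(x,y)[\omega] \circ w^h_\lambda]$. □

A direct computation using $P_h = Id + hQ$ gives

Proposition 7.2. $w^h_\lambda$ is the only solution of

$\varphi(\omega) = \mathrm{val}_{X \times Y}\left[g(\omega, x, y) + \frac{1 - \lambda h}{\lambda} Q(x,y)[\omega] \circ \varphi\right]$.

We now apply the results of Section 5.

Proposition 7.3.

$w^h_\lambda = w_\mu$, with $\mu = \frac{\lambda}{1 + \lambda - \lambda h}$.

Proof. Apply Proposition 5.2. □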

7.2. Vanishing duration. 

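We now recover the convergence property in [10].

Corollary 7.1.

For a fixed $\lambda$, $w^h_\lambda$ converges as h goes to 0. The limit, denoted $\bar w_\lambda$, equals $w_{\frac{\lambda}{1+\lambda}}$, hence is the only
solution of:

(19) $\varphi = \mathrm{val}\left[g + \frac{Q}{\lambda} \circ \varphi\right]$.

Moreover, $\|w^h_\lambda - \bar w_\lambda\| \leq C \lambda h$.

Proof. For the convergence, apply Proposition 5.3.
By definition $w_{\frac{\lambda}{1+\lambda}}$ satisfies

$w_{\frac{\lambda}{1+\lambda}} = \mathrm{val}\left[\frac{\lambda}{1+\lambda} g + \frac{1}{1+\lambda} (Id + Q) \circ w_{\frac{\lambda}{1+\lambda}}\right]$

that is

$w_{\frac{\lambda}{1+\lambda}} = \mathrm{val}\left[g + \frac{Q}{\lambda} \circ w_{\frac{\lambda}{1+\lambda}}\right]$. □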


More generally consider a sequence of stage durations $\{h_i\}$ with $h_i \leq h$ and $\sum_i h_i = +\infty$
inducing a partition H. The value of the associated $\lambda$-discounted game is given by $W^H_\lambda = \Pi_{i=1}^{\infty} D^{h_i}_\lambda(0)$
hence satisfies

Proposition 7.4.

$\|W^H_\lambda - w_{\frac{\lambda}{1+\lambda}}\| \leq 2 \|T(0)\| h$.

Proof. Apply Proposition 5.4. □

Once again wx has to be interpreted as the A-discounted value of the continuous time game, 
see d], [9], [To]. 

Note that our game theoretic framework is very general, in particular there is no finiteness 
assumption on the actions or states. 
7.3. Asymptotic behavior.

The value of the $\lambda$ discounted game with stage duration h has the same asymptotic behavior,
as $\lambda$ tends to 0, as $w_\lambda$.

Proposition 7.5.

$\|w^h_\lambda - w_\lambda\| \leq 2C\lambda$.

Proof. Apply Proposition 5.5. □

More generally one obtains

Proposition 7.6. For any $\{h_i\}$ with $\sum_i h_i = +\infty$ inducing a partition H, $\|W^H_\lambda - w_\lambda\| \leq C' \lambda$
where C' depends only on the game.

Proof. Immediate consequence of Proposition 5.6 and its proof, since, by non expansiveness of
the value operator, for any game with a payoff bounded by C the associated Shapley operator T
satisfies assumption (H) with $k(\lambda) = \lambda = o(\sqrt{\lambda})$ and $\ell(\|z\|) = C + \|z\|$. □

7.4. Invariance properties. 

Let T be the Shapley operator associated to the game $(g, Q)$. Then $T_\alpha$ defined by (18) is the
Shapley operator associated to $(g, \alpha Q)$ since

$T_\alpha(f) = \mathrm{val}_{X \times Y}[g(\omega, x, y) + (Id + Q(x,y))[\omega] \circ \alpha f] + (1 - \alpha) f$
$= \mathrm{val}_{X \times Y}[g(\omega, x, y) + (Id + \alpha Q(x,y))[\omega] \circ f]$.

This implies

Proposition 7.7. For any kernel R, the $\lambda$-discounted value of the game $G(g, \frac{\lambda}{1-\lambda} R)$ is
independent of $\lambda < 1/2$ and the only solution of

$\varphi(\omega) = \mathrm{val}_{X \times Y}[g(\omega, x, y) + R(x,y)[\omega] \circ \varphi]$.

Proof. Apply Corollary 5.1. □

This shows a tradeoff between the size of the kernel and the discount factor. Taking into
account Proposition 7.3 one derives an invariance property of the value on the product space:
discount factor × stage duration × kernel:

$Val(\lambda, h, R) = Val\left(\frac{\lambda}{1 + \lambda - \lambda h}, 1, R\right) = Val\left(\lambda, 1, \frac{1 - \lambda h}{1 - \lambda} R\right)$.

Similar covariance properties were obtained in [S] and m- 

8. Discretization approach 

We consider now the game $\bar G^h$ which corresponds to the discretization of the continuous time
game. We will study two frameworks, as in the previous sections: either a fixed finite length
or a fixed discount factor, and we will analyse the behavior of the associated values as the stage
duration h goes to 0.

8.1. Finite length.

The un-normalized value of the n-stage game with stage duration h satisfies $\bar V^h_n = (\bar T_h)^n(0)$,
where $\bar T_h$ is the Shapley operator of $\bar G^h$ introduced in Section 3.2.
Similarly for varying stage duration, corresponding to a partition H, one gets a recursive equation
of the form $\bar V_H(t) = \Pi_i \bar T_{h_i}(0)$.

Lemma 8.1. There exists $C_0$ such that for all $f \in F$ and $h \in [0,1]$,

$\|\bar T_h(f) - T_h(f)\| \leq C_0 (1 + \|f\|) h^2$.

Proof. By non expansiveness of the value operator,

$\|\bar T_h(f) - T_h(f)\| \leq \|h g(\cdot) - g^h(\cdot)\| + \|f\|\, \|P_h(\cdot) - P^h(\cdot)\|$

and both terms on the right are of order $h^2$ since $P_h = Id + hQ$ and $P^h = Id + hQ + h\, O(h)$. □
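The key estimate in this proof is that the continuous time transition $P^h = e^{hQ}$ and its linearization $P_h = Id + hQ$ differ by a term of order $h^2$. The sketch below checks this on a hypothetical two-state generator, computing the matrix exponential by a truncated power series; the generator, the truncation order and the function names are illustrative assumptions. The comparison bound $\frac{(h\|Q\|)^2}{2} e^{h\|Q\|}$ is the standard remainder estimate of the exponential series in a submultiplicative norm.

```python
import math

def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(q, h, terms=30):
    """Truncated power series of exp(h*q); adequate for the small h*q used here."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        # term becomes (h q)^m / m!
        term = [[v * h / m for v in row] for row in mat_mul(term, q)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def row_sum_norm(m):
    """Maximal absolute row sum (operator norm for the sup norm on functions)."""
    return max(sum(abs(v) for v in row) for row in m)

def linearization_gap(q, h):
    """|| exp(hQ) - (Id + hQ) || in the maximal row sum norm."""
    n = len(q)
    ph = expm_taylor(q, h)
    return row_sum_norm([[ph[i][j] - ((1.0 if i == j else 0.0) + h * q[i][j])
                          for j in range(n)] for i in range(n)])

# Hypothetical generator of a two-state continuous time chain (rows sum to 0).
Q = [[-1.0, 1.0], [2.0, -2.0]]
```

Dividing `linearization_gap(Q, h)` by `h * h` for decreasing values of `h` gives a quantity that stays bounded, which is the $h\,O(h)$ behavior used in the proof.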

Proposition 8.1.

There exists C depending only on the game G such that for any finite sequence $(h_i)_{i \leq n}$ in [0,h]
with sum t and corresponding partition H:

$\|\bar V_H(t) - V_t\| \leq C (\sqrt{ht} + ht + ht^2)$.

In particular for a given t, $\bar V_H(t)$ tends to $V_t$ as h goes to 0.

Proof. The value of any game with total length less than t is bounded by some $C_1 t$, independently
of h. Hence non expansiveness of the operators as well as the previous Lemma 8.1 gives

$\|\bar V_H(t) - \Pi_i T_{h_i}(0)\| = \|\Pi_i \bar T_{h_i}(0) - \Pi_i T_{h_i}(0)\|$

$\leq \|\bar T_{h_1}(\Pi_{i \geq 2} \bar T_{h_i}(0)) - \bar T_{h_1}(\Pi_{i \geq 2} T_{h_i}(0))\| + \|\bar T_{h_1}(\Pi_{i \geq 2} T_{h_i}(0)) - T_{h_1}(\Pi_{i \geq 2} T_{h_i}(0))\|$

$\leq \|\Pi_{i \geq 2} \bar T_{h_i}(0) - \Pi_{i \geq 2} T_{h_i}(0)\| + C_0 h_1^2 \left(1 + C_1 \sum_{i=2}^{n} h_i\right)$.

Without loss of generality $C_1 \geq 1$ hence by summation,

$\|\bar V_H(t) - \Pi_i T_{h_i}(0)\| \leq C_0 C_1 (1 + t) \sum_{i=1}^{n} h_i^2$

$\leq C h (1 + t) \sum_{i=1}^{n} h_i$

$= C h t (1 + t)$

for $C = C_0 C_1$. Then Proposition 6.2 yields the result. □


Remark 8.1. For a given h, the right hand term is quadratic in t, hence we do not link the
asymptotic behavior of the normalized quantity $\bar v_H = \frac{\bar V_H(t)}{t}$ and of $v_t = \frac{V_t}{t}$. However if n is a
function of h converging slowly enough to infinity, the previous proposition can be used. For
example for n{h) = (so that t{h) = ^), one has 

Ptih)-vt{h)\\ =0{Vh). 

8.2. Discounted case. 

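We consider uniform stage duration h. The normalized value $\bar w^h_\lambda$ of the discretization with mesh
h of the $\lambda$-discounted continuous game satisfies the fixed point equation

$\bar w^h_\lambda(\omega) = \mathrm{val}_{X \times Y}\left[E \int_0^h \lambda e^{-\lambda t} g(\omega_t, x, y)\, dt + e^{-\lambda h} P^h(x,y)[\omega] \circ \bar w^h_\lambda\right]$.

Proposition 8.2.

For a given $\lambda$, $\bar w^h_\lambda$ tends to $\bar w_\lambda = w_{\frac{\lambda}{1+\lambda}}$ as h goes to 0.

Proof. The equations for $\bar w^h_\lambda$ and $w^h_\lambda$, as well as the non expansiveness of the value operator, give:

$\|\bar w^h_\lambda - w^h_\lambda\| \leq \lambda h \left\| \frac{1}{h} E \int_0^h e^{-\lambda t} g(\omega_t, \cdot)\, dt - g(\omega_0, \cdot) \right\| + e^{-\lambda h} \|\bar w^h_\lambda - w^h_\lambda\| + e^{-\lambda h} \|P^h - P_h\|\, \|w^h_\lambda\| + |1 - \lambda h - e^{-\lambda h}|\, \|w^h_\lambda\|$

$\leq \lambda O(h^2) + e^{-\lambda h} \|\bar w^h_\lambda - w^h_\lambda\| + O(h^2) + \lambda^2 O(h^2)$

hence for a fixed $\lambda$, $(1 - e^{-\lambda h}) \|\bar w^h_\lambda - w^h_\lambda\| = O(h^2)$ and the result follows from Corollary 7.1. □

Similar properties were obtained in [10].

For an alternative approach to the limit behavior of the discretization of the continuous model,
relying on viscosity solution tools and extending to various information structures on the state,
see [20].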


9. Extensions and concluding comments 

9.1. Stochastic games: no signals on the state. 

Consider a finite stochastic game where the players know only the initial distribution $m \in \Delta(\Omega)$
and the actions at each stage.

The basic equation for the exact game with duration h is then

$T_h f(m) = \mathrm{val}\left[h g(m; x, y) + \sum_{i,j} x_i y_j f(m * P_h(i,j))\right]$

with $[m * P_h(i,j)](\omega) = \sum_{z \in \Omega} m(z) P_h(i,j)[z](\omega)$ being the image of the probability m by the kernel
$P_h(i,j)$.

The equation

$T_h = hT + (1 - h) Id$

does not hold anymore and $T - Id$ has to be replaced by $\lim_{h \to 0} \frac{T_h - Id}{h}$. The study of such
games with varying duration thus seems more involved.

9.2. Link with games with uncertain duration.

Notice that $T_h = (1 - h) Id + hT$ is a particular case of an operator of the form $\sum_i \alpha_i T^i$, $\alpha_i \geq
0$, $\sum_i \alpha_i = 1$, which corresponds to some generalized iterate [12] of T. Hence all the values
computed in sections 4.3, 5.3 and so on, can also be seen as the value of some games with
uncertain duration. See [23] for specific remarks in the particular case of V^. 

9.3. Oscillations. 

Several examples of stochastic games (either with a finite set of states and compact sets of actions
[24], or compact set of states and finite set of actions [26]) were recently constructed for which the
values $v_n$ and $w_\lambda$ do not converge. Hence the values of the corresponding games with vanishing
duration (and thus their limit as continuous time games) do not converge when t goes to infinity
or $\lambda$ to 0.

9.4. Comparison to the literature. 

The approach here is different from the one of Neyman [10]: the proofs are based on properties
of operators and not on strategies. For example [10] shows that playing optimally in (19) will
imply Corollary 7.1.

By comparison our tools consider only the values and apply to any non expansive map T.

9.5. Main results.

The main results can be summarized in two parts: 

- for a given finite length (or discounted evaluation) the value of the game with vanishing stage
duration converges, thus defining a limit value for the associated continuous time game. Moreover
the limit is described explicitly.

- as the length goes to $\infty$ or the discount factor goes to 0, the impact of the stage duration
goes to 0 and the asymptotic behavior of the normalized value function is independent of the
discretization.


References 

[1] Brezis H. (1973) Operateurs Maximaux Monotones et Semi-Groupes de Contractions dans les Espaces de 
Hilbert, North-Holland. 

[2] Brezis H. and A. Pazy (1970) Accretive sets and differential equations in Banach spaces, Israel Journal of 
Mathematics, 8, 367-383. 

[3] Guo X. and O. Hernandez-Lerma (2003) Zero-sum games for continuous-time Markov chains with unbounded 
transition and average payoff rates, Journal of Applied Probability, 40, 327-345. 

[4] Guo X. and O. Hernandez-Lerma (2005) Zero-sum continuous-time Markov games with unbounded transition
and discounted payoff rates, Bernoulli, 11, 1009-1029.

[5] Kolokoltsov V.N. (1992) On linear, additive, and homogeneous operators in idempotent analysis. Advances
in Soviet Mathematics 13, Idempotent Analysis, Ed. V.P. Maslov and S.N. Samborski, 87-101.

[6] Mertens J.-F., S. Sorin and S. Zamir (2015) Repeated Games, Cambridge University Press.

[7] Miyadera I. and S. Oharu (1970) Approximation of semi-groups of nonlinear operators, Tohoku Math. Journal,
22, 24-47.

[8] Neyman A. (2003) Stochastic games and nonexpansive maps. Stochastic Games and Applications, Neyman 
A. and S. Sorin (eds.), NATO Science Series, C 570, Kluwer Academic Publishers, 397-415. 

[9] Neyman A. (2012) Continuous-time stochastic games, DP 616, CSR Jerusalem.

[10] Neyman A. (2013) Stochastic games with short-stage duration. Dynamic Games and Applications, 3, 236-278. 

[11] Neyman A. and S. Sorin (eds.) (2003) Stochastic Games and Applications, NATO Science Series, C 570, 
Kluwer Academic Publishers. 

[12] Neyman A. and S. Sorin (2010) Repeated games with public uncertain duration process. International Journal 
of Game Theory, 39, 29-52. 

[13] Nowak A.S. (1985) Universally measurable strategies in zero-sum stochastic games. Annals of Probability, 
13, 269-287. 

[14] Nowak A.S. (2003) Zero-sum stochastic games with Borel state spaces. Stochastic Games and Applications, 
Neyman A. and S. Sorin (eds.), NATO Science Series, C 570, Kluwer Academic Publishers, 77-91. 

[15] Prieto-Rumeau T., Hernandez-Lerma O. (2012) Selected Topics on Continuous-Time Controlled Markov
Chains and Markov Games, Imperial College Press.

[16] Rosenberg D. and S. Sorin (2001) An operator approach to zero-sum repeated games, Israel Journal of
Mathematics, 121, 221-246.

[17] Shapley L. S. (1953) Stochastic games, Proceedings of the National Academy of Sciences of the U.S.A, 39, 
1095-1100. 

[18] Sorin S. (2003) The operator approach to zero-sum stochastic games. Stochastic Games and Applications, 
Neyman A. and S. Sorin (eds.), NATO Science Series, C 570, Kluwer Academic Publishers, 375-395. 

[19] Sorin S. (2004) Asymptotic properties of monotonic nonexpansive mappings. Discrete Event Dynamic
Systems, 14, 109-122.

[20] Sorin S. (2015) Limit value of dynamic zero-sum games with vanishing stage duration, preprint. 

[21] Tanaka K. and K. Wakuta (1977) On continuous Markov games with the expected average reward criterion, 
Sci. Rep. Niigata Univ. Ser. A, 14, 15-24. 

[22] Vigeral G. (2009) Proprietes asymptotiques des jeux repetes a somme nulle, PhD Thesis, UPMC-Paris 6.

[23] Vigeral G. (2010) Evolution equations in discrete and continuous time for non expansive operators in Banach
spaces, ESAIM COCV, 16, 809-832.

[24] Vigeral G. (2013) A zero-sum stochastic game with compact action sets and no asymptotic value. Dynamic 
Games and Applications, 3, 172-186. 

[25] Zachrisson L.E. (1964) Markov Games, Advances in Game Theory, Dresher M., L. S. Shapley and A.W.
Tucker (eds), Annals of Mathematical Studies, 52, Princeton University Press, 210-253.

[26] Ziliotto B. (2013) Zero-sum repeated games: counterexamples to the existence of the asymptotic value and
the conjecture maxmin = lim $v_n$, preprint hal-00824039, to appear in Annals of Probability.





Sylvain Sorin, Sorbonne Universites, UPMC Univ Paris 06, Institut de Mathematiques de Jussieu-
Paris Rive Gauche, UMR 7586, CNRS, Univ Paris Diderot, Sorbonne Paris Cite, F-75005, Paris,
France

E-mail address: sylvain.sorin@imj-prg.fr
http://webusers.imj-prg.fr/~sylvain.sorin/

Guillaume Vigeral, Universite Paris-Dauphine, CEREMADE, Place du Marechal De Lattre de
Tassigny. 75775 Paris cedex 16, France

E-mail address: vigeral@ceremade.dauphine.fr

http://www.ceremade.dauphine.fr/~vigeral/indexenglish.html


